Iterative Geometry-Aware Cross Guidance Network for Stereo Image Inpainting
Currently, single image inpainting has achieved promising results based on
deep convolutional neural networks. However, inpainting stereo images with
missing regions has not been explored thoroughly, and it is a significant yet
distinct problem. One crucial requirement for stereo image inpainting is
stereo consistency. To achieve this, we propose an Iterative Geometry-Aware
Cross Guidance Network (IGGNet). The IGGNet contains two key ingredients: a
Geometry-Aware Attention (GAA) module and an Iterative Cross Guidance (ICG)
strategy. The GAA module relies on epipolar geometry cues and learns
geometry-aware guidance from one view to another, which helps keep the
corresponding regions of the two views consistent. However, learning guidance
from co-existing missing regions is challenging. To address this issue, the ICG
strategy is proposed, which can alternately narrow down the missing regions of
the two views in an iterative manner. Experimental results demonstrate that our
proposed network outperforms the latest stereo image inpainting model and
state-of-the-art single image inpainting models.
Comment: Accepted by IJCAI 202
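The Iterative Cross Guidance idea can be illustrated with a toy sketch. This is not the paper's network: it drops learned features and disparity warping, and simply fills each view's hole from pixels the other view already knows, alternating until only the jointly missing region remains (which a single-image inpainter would then handle).

```python
# Toy sketch of the Iterative Cross Guidance (ICG) idea (illustrative only):
# two views with overlapping holes are filled alternately; each round fills
# only pixels whose counterpart (same index here, ignoring disparity) is
# already known in the other view.

def icg_fill(view_a, view_b, fill=0.0, max_iters=10):
    """views are lists where None marks a missing pixel."""
    a, b = list(view_a), list(view_b)
    for _ in range(max_iters):
        changed = False
        for i in range(len(a)):
            # fill A's hole from B's known pixel, then vice versa
            if a[i] is None and b[i] is not None:
                a[i] = b[i]          # guidance from the other view
                changed = True
            elif b[i] is None and a[i] is not None:
                b[i] = a[i]
                changed = True
        if not changed:              # only jointly missing pixels remain
            break
    # pixels missing in both views fall back to a default (stand-in for
    # single-image inpainting)
    a = [fill if x is None else x for x in a]
    b = [fill if x is None else x for x in b]
    return a, b
```

Alternating the fill direction is what lets each view's shrinking hole provide progressively more guidance to the other.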
Diffusion-based Image Translation with Label Guidance for Domain Adaptive Semantic Segmentation
Translating images from a source domain to a target domain for learning
target models is one of the most common strategies in domain adaptive semantic
segmentation (DASS). However, existing methods still struggle to preserve
semantically-consistent local details between the original and translated
images. In this work, we present an innovative approach that addresses this
challenge by using source-domain labels as explicit guidance during image
translation. Concretely, we formulate cross-domain image translation as a
denoising diffusion process and utilize a novel Semantic Gradient Guidance
(SGG) method to constrain the translation process, conditioning it on the
pixel-wise source labels. Additionally, a Progressive Translation Learning
(PTL) strategy is devised to enable the SGG method to work reliably across
domains with large gaps. Extensive experiments demonstrate the superiority of
our approach over state-of-the-art methods.
Comment: Accepted to ICCV 202
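The gradient-guidance mechanism resembles classifier guidance in diffusion models. The following scalar sketch is purely illustrative (the names and the quadratic label score are assumptions, not the paper's SGG method): each denoising step is nudged by the gradient of a label-conditioned score, steering the sample toward label consistency.

```python
# Hypothetical 1-D sketch of gradient guidance in a denoising loop.

def label_score(x, target=1.0):
    # stand-in for log p(label | x): peaked at the labeled target value
    return -(x - target) ** 2

def guided_denoise(x, steps=50, step_size=0.1, guidance=0.5):
    for _ in range(steps):
        # finite-difference gradient of the label score
        eps = 1e-5
        grad = (label_score(x + eps) - label_score(x - eps)) / (2 * eps)
        # crude "denoising" drift toward zero, plus the guidance term
        x = x - step_size * x + guidance * step_size * grad
    return x
```

The sample settles between the unconditional drift's target and the label target, which is the qualitative effect of conditioning the translation on pixel-wise labels.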
Long-Term Anticipation of Activities with Cycle Consistency
With the success of deep learning methods in analyzing activities in videos,
attention has recently turned toward anticipating future activities. However,
most work on anticipation either analyzes a
partially observed activity or predicts the next action class. Recently, new
approaches have been proposed that extend the prediction horizon up to several
minutes into the future and anticipate a sequence of future activities,
including their durations. While these works decouple the semantic
interpretation of the observed sequence from the anticipation task, we propose
a framework for anticipating future activities directly from the features of
the observed frames and train it in an end-to-end fashion. Furthermore, we
introduce a cycle consistency loss over time by predicting the past activities
given the predicted future. Our framework achieves state-of-the-art results on
two datasets: the Breakfast dataset and 50Salads.
Comment: GCPR 202
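The cycle-consistency loss can be sketched abstractly: predict the future from the observed sequence, predict the past back from that future, and penalize disagreement with what was actually observed. The toy models below are placeholders, not the paper's networks.

```python
# Minimal sketch of cycle consistency over time (models are toy stand-ins).

def forward_model(observed):
    # stand-in future predictor: repeat the last observed activity
    return [observed[-1]] * 2

def backward_model(future):
    # stand-in past predictor: reconstruct the past from the predicted future
    return [future[0]] * 3

def cycle_loss(observed):
    future = forward_model(observed)
    reconstructed = backward_model(future)
    # fraction of mismatched past activity labels
    return sum(o != r for o, r in zip(observed, reconstructed)) / len(observed)
```

A future prediction that cannot explain the observed past incurs a penalty, which is the regularizing effect the loss provides during end-to-end training.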
Learning Latent Global Network for Skeleton-based Action Prediction
Human actions represented with 3D skeleton sequences are robust to cluttered backgrounds and illumination changes. In this paper, we investigate skeleton-based action prediction, which aims to recognize an action from a partial skeleton sequence containing incomplete action information. We propose a new Latent Global Network based on adversarial learning for action prediction. The proposed network provides latent long-term global information that is complementary to the local action information of the partial sequences, and combining the two improves action prediction. We test the proposed method on three challenging skeleton datasets and report state-of-the-art performance.
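The fusion of local and latent global information can be sketched as follows; all names are hypothetical and the adversarial learning of the global feature is omitted, leaving only the combine-then-classify step.

```python
# Illustrative sketch: fuse local features from the partial sequence with a
# latent global feature, then score action classes on the fused vector.

def fuse(local_feat, latent_global_feat):
    # simple concatenation; the paper learns the global part adversarially
    return local_feat + latent_global_feat

def predict(fused, classifier_weights):
    # linear scoring over the fused feature; argmax gives the action class
    scores = [sum(w * f for w, f in zip(ws, fused)) for ws in classifier_weights]
    return max(range(len(scores)), key=scores.__getitem__)
```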
ERA: Expert Retrieval and Assembly for Early Action Prediction
Early action prediction aims to successfully predict the class label of an
action before it is completely performed. This is a challenging task because
the beginning stages of different actions can be very similar, with only minor
subtle differences for discrimination. In this paper, we propose a novel Expert
Retrieval and Assembly (ERA) module that retrieves and assembles a set of
experts most specialized at using discriminative subtle differences, to
distinguish an input sample from other highly similar samples. To encourage our
model to effectively use subtle differences for early action prediction, we
push experts to discriminate exclusively between samples that are highly
similar, forcing these experts to learn to use subtle differences that exist
between those samples. Additionally, we design an effective Expert Learning
Rate Optimization method that balances the experts' optimization and leads to
better performance. We evaluate our ERA module on four public action datasets
and achieve state-of-the-art performance.
Comment: Accepted to ECCV 202
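The retrieve-and-assemble step resembles a sparse mixture-of-experts: a gate scores a bank of experts, the top-k are retrieved, and their outputs are combined with normalized gate weights. The sketch below is a generic illustration of that pattern, not the paper's module.

```python
# Hedged sketch of expert retrieval and assembly (names are illustrative).

def retrieve_and_assemble(x, experts, gate_scores, k=2):
    # retrieve: pick the k experts the gate scores highest for this input
    ranked = sorted(range(len(experts)), key=lambda i: gate_scores[i],
                    reverse=True)
    top = ranked[:k]
    total = sum(gate_scores[i] for i in top)
    # assemble: weighted combination of the retrieved experts' outputs
    return sum(gate_scores[i] / total * experts[i](x) for i in top)
```

Restricting assembly to the few most relevant experts is what lets each expert specialize in the subtle differences between highly similar samples.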
DiffPose: Toward More Reliable 3D Pose Estimation
Monocular 3D human pose estimation is quite challenging due to the inherent
ambiguity and occlusion, which often lead to high uncertainty and
indeterminacy. On the other hand, diffusion models have recently emerged as an
effective tool for generating high-quality images from noise. Inspired by their
capability, we explore a novel pose estimation framework (DiffPose) that
formulates 3D pose estimation as a reverse diffusion process. We incorporate
novel designs into our DiffPose to facilitate the diffusion process for 3D pose
estimation: a pose-specific initialization of pose uncertainty distributions, a
Gaussian Mixture Model-based forward diffusion process, and a
context-conditioned reverse diffusion process. Our proposed DiffPose
significantly outperforms existing methods on the widely used pose estimation
benchmarks Human3.6M and MPI-INF-3DHP. Project page:
https://gongjia0208.github.io/Diffpose/.
Comment: Accepted to CVPR 202
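A scalar toy version of the framing: initialize from a sample of a pose-uncertainty distribution (here a two-component Gaussian mixture, echoing the GMM-based forward process) and iteratively denoise toward a context-conditioned estimate. All names and the simple drift update are assumptions, not DiffPose's actual samplers.

```python
import random

# Illustrative 1-D sketch of pose estimation as reverse diffusion.

def sample_gmm(means=(-1.0, 1.0), sigma=0.5, weights=(0.5, 0.5)):
    # draw the initial noisy "pose" from a Gaussian mixture, a stand-in
    # for a pose-specific uncertainty distribution
    m = random.choices(means, weights=weights)[0]
    return random.gauss(m, sigma)

def reverse_diffusion(x, context_pose, steps=100, rate=0.1):
    # each reverse step moves the noisy pose toward the
    # context-conditioned estimate
    for _ in range(steps):
        x = x + rate * (context_pose - x)
    return x
```

Starting from an uncertainty-aware distribution rather than pure noise is the design choice the abstract highlights: the reverse process then only has to resolve the remaining ambiguity.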
Meta Agent Teaming Active Learning for Pose Estimation
Existing pose estimation approaches often require a large number of annotated images to attain good estimation performance, and such annotations are laborious to acquire. To reduce the human effort of pose annotation, we propose a novel Meta Agent Teaming Active Learning (MATAL) framework to actively select and label informative images for effective learning. MATAL formulates the image selection procedure as a Markov Decision Process and learns an optimal sampling policy that directly maximizes the performance of the pose estimator based on the reward. Our framework consists of a novel state-action representation as well as a multi-agent team to enable batch sampling in the active learning procedure. The framework can be effectively optimized via meta-optimization to accelerate adaptation to the gradually expanded labeled data during deployment. Finally, we show experimental results on both human hand and body pose estimation benchmark datasets and demonstrate that our method consistently outperforms all baselines under the same annotation budget. Moreover, to obtain similar pose estimation accuracy, MATAL saves around 40% of labeling effort on average compared to state-of-the-art active learning frameworks.
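The selection-as-MDP framing can be illustrated with a toy sketch in which the learned policy is replaced by a greedy rule: each "action" picks the unlabeled image with the highest informativeness score, and the "state transition" shrinks the pool, until the annotation budget is spent. The names and scoring are hypothetical, not MATAL's learned policy.

```python
# Toy sketch of batch image selection as a sequential decision process.

def select_batch(scores, budget):
    """scores: {image_id: informativeness}; returns ids to annotate."""
    pool = dict(scores)
    chosen = []
    for _ in range(min(budget, len(pool))):
        best = max(pool, key=pool.get)  # action: pick most informative image
        chosen.append(best)
        pool.pop(best)                  # state transition: shrink the pool
    return chosen
```

The actual framework replaces the fixed score with a reward-driven policy, so the selection criterion itself is learned to maximize downstream pose-estimation performance.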